ABSTRACT

We discuss the theoretical machinery involved in predicting financial market movements using an artificial
market model which has been trained on real financial data. This approach to market prediction - in particular,
forecasting financial time-series by training a third-party or 'black box' game on the financial data itself - was
discussed by Johnson et al. and was based on some encouraging preliminary investigations of the dollar-yen
exchange rate, various individual stocks, and stock market indices. However, the initial attempts lacked a
clear formal methodology. Here we present a detailed methodology, using optimization techniques to build an
estimate of the strategy distribution across the multi-trader population. In contrast to earlier attempts, we are
able to present a systematic method for identifying 'pockets of predictability' in real-world markets. We find that
as each pocket closes up, the black-box system needs to be 'reset' - which is equivalent to saying that the current
probability estimates of the strategy allocation across the multi-trader population are no longer accurate.
Instead, new probability estimates need to be obtained by iterative updating, until a new 'pocket of predictability'
emerges and reliable prediction can resume.

Keywords: Econophysics, Multi-Agent Games, Kalman Filter

1. INTRODUCTION 

Judging from the literature, in particular the wide range of popular finance books, the possibility of predicting
future movements in financial markets ranges from significant (see, for example, the many books on chartism)
to impossible. Another scenario of course does exist - that financial markets may be neither predictable nor
unpredictable all the time, but may instead have periods where they are predictable (i.e. non-random) and
periods where they are not (i.e. random). Evidence for such 'pockets of predictability' was found several years
ago by Johnson et al. A similar study was reported subsequently by Sornette et al. However, a formal
report of a theoretical framework for identifying such periods of predictability has not appeared in the literature
to date.

The rationale behind our initial proposal to predict financial markets using artificial market models is as
follows. Financial markets produce time-series, as does any dynamical system which is evolving in time, such as
the ambient air-temperature, or the electrical signals generated by heart-rhythms. In principle, one could use 
any time-series analysis technique to build up a picture of these statistical fluctuations and variations - and then
use this technique to attempt some form of prediction, either on the long or short time-scale. One example would 
be to use a multivariate analysis of the prices themselves in order to build up an estimate of the parameters 
in the multivariate expansion, and then run this model forwards. However, such a multivariate model may not 
bear a relation to any physical representation of the market itself. Instead, imagine that we are able to identify 
an artificial market model which seems to produce the aggregate statistical behavior (i.e. the stylized facts) 
observed in the financial market itself. It now has the additional advantage that it also mimics the microscopic 
structure of the market, i.e. it contains populations of artificial traders who use strategies in order to make 
decisions based on available information, and will adapt their behavior according to past successes or failures. 
All other things being equal, we believe that such a model may be intrinsically 'better' than a purely numerical
multivariate one - and may even be preferable to many more sophisticated models such as certain neural network
approaches, which also may not be correctly capturing a realistic representation of the microscopic details of a 
physical market. The question then arises as to whether such an artificial market model could be 'trained' on 
the real market in order to produce reliable forecasts. 

Although in principle one could attempt to train any artificial market model on a given financial time-series, 
each of the model parameters will need to be estimated - and if the model has too many parameters, this will
become practically impossible as the model will become under-determined. For this reason, our own (and Sornette
et al.'s) attempts have focused on training a minimal artificial market model. We had already shown that a
minimal model could be built from a binary multi-agent game in which agents are allowed not to participate if
they are not sufficiently confident.

Here we focus on the basic Minority Game where all agents trade at every time-step, since we are interested in 
describing in detail the parameter estimation process as opposed to creating the best possible market prediction 
model. In particular, there is no reason to expect that (i) the Minority Game's pay-off structure whereby only 
the minority group gets rewarded, or (ii) the Minority Game's traditional feature whereby all agents have the 
same memory time-scale m, are either realistic or optimal. We provide a more complete discussion of suitable 
pay-off structures in artificial financial markets in previous work. For the present discussion, we retain both
these features since they do not affect the formalism presented. However, we note that recent work by Mitman
et al. and subsequent work by Guo have shown that allowing agents to have multiple memory values does
indeed lead to improved performance both overall and for specific individual traders.

Binary Agent Resource games, such as the Minority Game and its many extensions, have a number of real-world
applications in addition to financial markets. For example, they are well-suited to studying traffic flow in
the fairly common situation of drivers having to choose repeatedly between two particular routes from work to 
home. In these examples, and in particular the financial market setting, one would like to predict how the system 
will evolve in time given the current and past states. In the context of the artificial market corresponding to 
the Minority Game and its generalizations, this ends up being a parameter estimation problem. In particular, it 
comes down to estimating the composition of the heterogeneous multi-trader population, and specifically how this 
population of traders is distributed across the strategy space. Here we investigate the use of iterative numerical 
optimization schemes to estimate these quantities, in particular the population's composition across the strategy 
space - we will then use these estimates to make forecasts on the time-series we are analyzing. Along with these 
forecasts, we also need to find a covariance matrix in order to determine the certainty with which we believe our 
forecast is correct. Such a covariance matrix is important for a number of reasons including risk analysis. 

Given the forecast and its associated covariance matrix, we will also need to decide whether to use the forecast 
or throw it away based on the covariance matrix (which represents the expected errors on the forecast). We 
discuss this point, and in so doing we will see that the system can fall into 'pockets of predictability' during which 
the system becomes predictable over some significant time-window. In the rest of this paper, we discuss these 
ideas and apply them to simulated and real financial market data, in order to identify pockets of predictability 
based on the model. 

2. PARAMETERIZING THE ARTIFICIAL MARKET MODEL 

2.1. A Binary Agent Resource Game 

In a Binary Agent Resource game, a group of $N$ agents compete for a limited resource and each of them takes
a binary action. Let's denote this action at time step $k$ for agent $i$ by $a_i(\Omega_{k-1})$, where the action is in response
to a global information set defined by $\Omega_{k-1}$ consisting of all information up to time step $k-1$ available to all
agents. For each time step, there will exist a winning decision based on the action of the agents. This winning
decision $w_k$ will belong to the next global information set $\Omega_k$ and will be available to each agent in the future.

2.2. The Minority Game 

A particularly simple (indeed, possibly over-simplified) example of a Binary Agent Resource game is the Minority 
Game, which was proposed in 1997 by Challet and Zhang as a very specific game which highlights the competitive 
effects inherent in many complex adaptive systems. Since then, many variants of this game have been posed
with slight modifications to the original. Here we focus on the original Minority Game for the purposes of our
examples, though we note that the optimization formalism which we present also applies to other variants of the
game.

Memory    Decision
-1, -1     1
-1,  1    -1
 1, -1     1
 1,  1     1

Table 2.1. The left column shows the $2^2 = 4$ possible memory strings and the right column shows the strategy's response
to each memory string.

Let us first provide the motivation for using the Minority Game to forecast financial markets. Essentially, in 
financial markets, agents compete for a limited resource, which gives us the minority nature we are interested
in. For example, if the minority group is selling, the majority group will force the price up at the following time 
step because of the greater demand. At the exact same time, the minority group will sell at the overvalued price 
and gain profit. 

We start the game with a group of agents, each of which holds an assigned strategy. Let a given strategy
have memory size $m$, meaning it contains a binary decision for each of the $2^m$ possible binary strings of length
$m$. An example of a strategy with $m = 2$ is given in Table 2.1.

A given strategy will look back at the $m$ most recent bits of the winning decisions. At time step $k$, this would
be $(w_{k-m}, \ldots, w_{k-1})$. The strategy will then pick the corresponding binary decision. Using our example strategy
in Table 2.1, if $w_{k-2} = -1$ and $w_{k-1} = 1$, the strategy would make the decision $-1$. Before we can explain how
the strategy works, we must also define the time horizon, a bit string consisting of the $T$ most recent bits of the
winning decisions where $T$ is significantly larger than $m$. For example, at the beginning of time step $k$, the time
horizon would be $(w_{k-T}, \ldots, w_{k-1})$.

In order for the game to be interesting, we assume some agents hold more than one strategy in what we call 
their strategy set. We now need to define how the agents should choose amongst their strategies. Each agent 
scores each of their strategies over the time horizon by giving it one point if it would have predicted correctly 
and taking one away if it would have predicted incorrectly at each time step in the past. For example, if the 
time horizon was (-1, 1, -1, -1, 1), we would assign +1 - 1 + 1 = +1 points to our strategy in Table 2.1 since
the first decision on (-1, 1) to choose -1 would have been correct, while the second decision would have been
incorrect and the third one correct again. In this way, we could score all of the agents' strategies, and the agent
will simply pick the highest scoring strategy to play. The winning decision at time step $k$, $w_k$, is the minority
decision made by the agents as a whole.
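
The scoring rule above can be illustrated with a minimal sketch. The function name `score_strategy` and the dictionary representation of a strategy are assumptions made for illustration only; the example reproduces the +1 score worked out in the text for the strategy of Table 2.1.

```python
def score_strategy(strategy, horizon, m):
    """Score a strategy over a time horizon of past winning decisions.

    strategy: dict mapping an m-tuple of past winning decisions to a decision
              (hypothetical representation, chosen for this sketch).
    horizon:  sequence of winning decisions (+1 or -1), oldest first.
    """
    score = 0
    for t in range(m, len(horizon)):
        memory = tuple(horizon[t - m:t])
        # One point if the strategy would have picked the winning decision,
        # minus one point otherwise.
        score += 1 if strategy[memory] == horizon[t] else -1
    return score


# Strategy from Table 2.1 (m = 2) and the time horizon used in the text.
table_2_1 = {(-1, -1): 1, (-1, 1): -1, (1, -1): 1, (1, 1): 1}
horizon = (-1, 1, -1, -1, 1)
print(score_strategy(table_2_1, horizon, m=2))  # prints 1
```

An agent holding several strategies would call this once per strategy and play the highest scoring one, tossing a fair coin on ties.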

For ties between the scores of an agent's strategies, the agent will simply toss a fair coin to decide. Further, 
if there is a tie in the winning decision over all agents (i.e. an equal number of agents picked -1 and 1), we can 
again toss a fair coin to decide. Both of these, together, inject stochasticity into the system. As mentioned in the 
introduction, there are a number of variations on ways to score strategies that can be looked at. In this paper, 
we stick to the basic Minority Game structure. 

We are now interested in the time-series generated by the aggregate actions of the agents. We start the series
at $r_0 = 0$, where $r$ stands for returns, and allow $r_k$ to evolve by $r_k = r_{k-1} + \sum_i a_i(\Omega_{k-1})$, where $a_i(\Omega_{k-1})$ denotes
the response of agent $i$ at time $k$ to global information set $\Omega_{k-1}$. For the Minority Game, this response is simply
the decision made by agent $i$, and $\Omega_{k-1}$ consists only of the time horizon at time $k-1$. Again, agent $i$ makes
this decision by choosing the highest scoring strategy over the time horizon as explained above.

Further, let's define the difference series of the returns series as $z_0 = 0$ with $z_k = r_k - r_{k-1} = \sum_i a_i(\Omega_{k-1})$.
The difference series is the series we will estimate throughout this paper. It is trivial to find the returns series
given the difference series. In terms of the difference series, we can also define the winning decisions and the
time horizon, which we provide here for completeness. The winning decision at time step $k$, $w_k$, can be defined
by $w_k = -\mathrm{sgn}(z_k)$, where $\mathrm{sgn}(\cdot)$ represents the sign function, which is 1 for a positive argument, $-1$ for a
negative argument, and 0 otherwise. This simply states that when $z_k$ is positive, the minority chose $-1$ and vice versa.
Note that $z_k$ will never be 0 since we toss a fair coin for ties. As before, the time horizon is defined by the winning
decisions. At time step $k$, this would simply be $(w_{k-T}, \ldots, w_{k-1})$.
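
As a small illustration of how one time step turns the agents' decisions into the difference series, the returns series and the winning decision, consider the sketch below. The function name `play_round` is hypothetical. On an exact tie the sketch leaves the aggregate at zero and draws the winning decision with a fair coin, which is only one possible reading of the tie rule described above.

```python
import random


def play_round(actions, prev_return, rng=random):
    """Aggregate one round of +1/-1 decisions, one per agent.

    Returns (r_k, z_k, w_k): the new return, the difference-series value
    and the winning (minority) decision.
    """
    z = sum(actions)                 # z_k: sum of all agents' decisions
    if z > 0:
        w = -1                       # majority chose +1, so the minority is -1
    elif z < 0:
        w = 1
    else:
        w = rng.choice((-1, 1))      # tie: fair coin (assumption of this sketch)
    return prev_return + z, z, w


print(play_round([1, 1, -1, 1, -1], prev_return=0))  # prints (1, 1, -1)
```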

For a more thorough introduction to the Minority Game and other variations and extensions, there are a
number of available resources.

2.3. The Parameter Estimation Problem for Forecasting 

Generally speaking, we would expect that the types of data we might be able to forecast using the Minority 
Game would have also been generated by something similar to a Minority Game. If the real market in question 
corresponds to a mixed majority-minority game, for example, then clearly it makes sense to attempt matching 
the real-data to such a mixed version of the game. This point will be explored in another publication. Here we 
focus on the Minority Game as a specific example. 

There are many parameters in the Minority Game, so we have to choose a way to parameterize the game. 
In this paper, we fix m and T for all agents. We also say each agent possesses exactly two strategies. Next, we 
remove the parameter N specifying the number of agents, by allowing this to tend towards infinity and instead 
looking at the probability distribution over all possible strategy sets with two strategies of memory size m. For 
example, if $m = 1$, there are $2^2 = 4$ strategies with a memory size of one (there are $2^m$ possible bit strings of
length $m$ and 2 responses to each bit string). So there are $\binom{4}{2} = 6$ distinct pairs of strategies with memory size
$m = 1$. The parameter space we would like to estimate is the probability distribution over these six possible
strategy sets. Notice that we make a number of assumptions here to decide which parameters to estimate. Some 
of these assumptions can be relaxed a bit to create a larger parameter space if desired. 
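
The counting argument can be checked with a short enumeration. This is a sketch only: the function names `all_strategies` and `strategy_pairs`, and the representation of a strategy as a dictionary from memory strings to decisions, are assumptions made for illustration.

```python
from itertools import combinations, product


def all_strategies(m):
    """Enumerate every strategy of memory size m as a dict memory -> decision."""
    memories = list(product((-1, 1), repeat=m))          # 2^m memory strings
    return [dict(zip(memories, responses))
            for responses in product((-1, 1), repeat=len(memories))]


def strategy_pairs(m):
    """Enumerate the distinct two-strategy sets as pairs of strategy indices."""
    return list(combinations(range(len(all_strategies(m))), 2))


print(len(all_strategies(1)), len(strategy_pairs(1)))  # prints 4 6
```

The state to be estimated then has one probability per entry of `strategy_pairs(m)`, which is why the parameter space grows quickly with $m$.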

In this paper, we will provide a mechanism to estimate the probability distribution over a set of strategies. 
We will call this probability distribution the state $x_k$ at time step $k$. In the previous paragraph, we mentioned a
scheme to estimate the six pairs of $m = 1$ strategies. In this case, we were assuming all six are strategy sets that
were played when generating the time-series. However, we could also choose to estimate a strategy set with more
than two strategies per agent (or maybe even just one) and each of the strategies in the set can have different
memory lengths if desired (note that we may like to modify the scoring scheme for mixed memory sizes). In this
case, we would assume that this set of $N$ mixed memory and mixed strategy length strategy sets were played
when generating the time-series. 

Notice that this estimation problem is an inequality constrained problem with the constraints that each 
probability of playing a given strategy set must be greater than or equal to 0 and the probabilities must sum up
to 1. 

Section 3 and Section 4 discuss some of the technicalities of the optimization problem. Section 5 and Section 
6 provide some examples with results. 

3. ITERATIVE OPTIMIZATION METHODS TO SOLVE PARAMETER ESTIMATION PROBLEMS

We will now look at iterative (recursive) schemes to solve time-dependent parameter estimation problems. We
desire iterative schemes for a number of reasons. They provide a forecast in only one pass of the data. This
means the algorithm can be online and can quickly make forecasts as new measurements are observed. The 
iterative schemes we discuss also provide us with error bounds on forecasts so we can determine when we are in 
a state of high predictability (or if we ever attain such a state). This is important for financial data since risk is 
always an important factor. The first method we shall discuss is the Kalman Filter. 

3.1. Kalman Filter 

A Kalman Filter is simply an iterative least-squares scheme that attempts to find the best estimate at every 
iteration for a system governed by the following model: 

$$x_k = \Phi_{k,k-1} x_{k-1} + u_k, \qquad u_k \sim N(0, Q_{k,k-1}) \qquad (3.1.1)$$
$$z_k = H_k x_k + v_k, \qquad v_k \sim N(0, R_k) \qquad (3.1.2)$$

Here $x_k$ represents the true state of the underlying system. $\Phi_{k,k-1}$ represents the matrix used to make the
transition from state $x_{k-1}$ to $x_k$. The variable $z_k$ represents the measurement (or observation). $H_k$ is the matrix
that takes the state into measurement space. The variables $u_k$ and $v_k$ are both noise terms which are normally
distributed with mean 0 and covariances $Q_{k,k-1}$ and $R_k$, respectively.

The Kalman Filter will at every iteration make a prediction for $x_k$ which we denote by $\hat{x}_{k|k-1}$. We use the
notation $k|k-1$ since we will only use measurements provided until time step $k-1$ to make the prediction at
time $k$. We can define the state prediction error $\tilde{x}_{k|k-1}$ as the difference between the true state and the state
prediction.

$$\tilde{x}_{k|k-1} = x_k - \hat{x}_{k|k-1} \qquad (3.1.3)$$

In addition, the Kalman Filter will provide a state estimate for $x_k$ given all the measurements provided up to
and including time step $k$. We denote these estimates by $\hat{x}_{k|k}$. We can similarly define the state estimate error
by

$$\tilde{x}_{k|k} = x_k - \hat{x}_{k|k} \qquad (3.1.4)$$

Since we assume $u_k$ is normally distributed with mean 0, we make the state prediction simply by using $\Phi_{k,k-1}$
to make the transition. This is given by

$$\hat{x}_{k|k-1} = \Phi_{k,k-1} \hat{x}_{k-1|k-1} \qquad (3.1.5)$$

We can also calculate the associated covariance for the state prediction, which we call the covariance prediction.
This is actually just the expectation of the outer product of the state prediction error with itself. This is
given by

$$P_{k|k-1} = \Phi_{k,k-1} P_{k-1|k-1} \Phi_{k,k-1}' + Q_{k,k-1} \qquad (3.1.6)$$

Notice that we use the prime notation on a matrix throughout this paper to denote the transpose of that
matrix. Now we can make a prediction on what we expect to see for our measurement, which we call the
measurement prediction, by

$$\hat{z}_{k|k-1} = H_k \hat{x}_{k|k-1} \qquad (3.1.7)$$

The difference between our true measurement and our measurement prediction is often called the
innovation (or measurement residual). We will use the term innovation throughout this paper, and we calculate
this by

$$\nu_k = z_k - \hat{z}_{k|k-1} \qquad (3.1.8)$$

We can also calculate the associated covariance for the innovation, which we call the innovation covariance,
by

$$S_k = H_k P_{k|k-1} H_k' + R_k \qquad (3.1.9)$$



Next, we will calculate the Kalman Gain, which lies at the heart of the Kalman Filter. This essentially tells
us how much we prefer our new measurement over our state prediction. We calculate this by

$$K_k = P_{k|k-1} H_k' S_k^{-1} \qquad (3.1.10)$$

Using the Kalman Gain and the innovation, we update the state estimate. If we look carefully at the following
equation, we are essentially taking a weighted sum of our state prediction with the Kalman Gain multiplied by
the innovation. So the Kalman Gain is telling us how much to "weight in" information contained in the new
measurement. We calculate the updated state estimate by

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \nu_k \qquad (3.1.11)$$

Last but not least, we calculate the updated covariance estimate. This is actually just the expectation of the
outer product of the state estimate error with itself. Here we will give the most numerically stable form of this
equation, as this form prevents loss of symmetry and best preserves positive definiteness

$$P_{k|k} = (I - K_k H_k) P_{k|k-1} (I - K_k H_k)' + K_k R_k K_k' \qquad (3.1.12)$$

The covariance matrices throughout the Kalman Filter give us a way to measure the uncertainty of our state
prediction, state estimate, and the innovation. Also, notice that the Kalman Filter is recursive, and we require
an initial estimate $\hat{x}_{0|0}$ and associated covariance matrix $P_{0|0}$. Here we simply provided the equations of the
Kalman Filter without derivation. For a more thorough understanding of the Kalman Filter, there are a number
of available resources.
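
One iteration of equations (3.1.5) through (3.1.12) can be sketched as follows. This is a minimal illustration using NumPy, not the implementation used for the results in this paper; the function name `kalman_step` and its argument names are assumptions.

```python
import numpy as np


def kalman_step(x_est, P_est, z, Phi, H, Q, R):
    """One Kalman Filter iteration: predict, then update with measurement z."""
    x_pred = Phi @ x_est                          # state prediction (3.1.5)
    P_pred = Phi @ P_est @ Phi.T + Q              # covariance prediction (3.1.6)
    z_pred = H @ x_pred                           # measurement prediction (3.1.7)
    nu = z - z_pred                               # innovation (3.1.8)
    S = H @ P_pred @ H.T + R                      # innovation covariance (3.1.9)
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman Gain (3.1.10)
    x_new = x_pred + K @ nu                       # updated state estimate (3.1.11)
    A = np.eye(len(x_est)) - K @ H
    P_new = A @ P_pred @ A.T + K @ R @ K.T        # symmetric update form (3.1.12)
    return x_new, P_new, nu, S


# Scalar example: prior 0 with unit variance, measurement 1 with unit noise.
x, P, nu, S = kalman_step(np.array([0.0]), np.eye(1), np.array([1.0]),
                          np.eye(1), np.eye(1), np.zeros((1, 1)), np.eye(1))
print(x, P)  # the estimate moves halfway to the measurement and the variance halves
```

Returning the innovation and its covariance alongside the estimate is a design choice of this sketch; both are needed later when the noise covariances are matched to past innovations.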

3.2. Constrained Iterative Optimization Methods 

A Kalman Filter would certainly be the correct tool for the parameter estimation problem described in Section 2.3 if we
were interested in an iterative solution and did not have any equality and inequality constraints. However, note
that we have the following constraints on our states at each time step that make this a constrained problem:

$$\sum_i \hat{x}_{i,k|k} = 1 \quad \text{and} \quad \hat{x}_{i,k|k} \ge 0, \; \forall i \qquad (3.2.1)$$

Here $\hat{x}_{i,k|k}$ is the $i$-th element of $\hat{x}_{k|k}$, which represents the single probability of using a certain strategy set
at time step $k$. Since we would like to use an iterative scheme, we must now think of a different method which
acts as a Kalman Filter but allows for equality and inequality constrained optimization. In Section 3.3, we will
introduce a method for solving equality constrained problems iteratively in a Kalman Filter like manner. From
here we will make the extension to inequality constrained problems in Section 3.4.

3.3. Nonlinear Equality Constraints 

Let's add to our model given by equations (3.1.1) and (3.1.2) the following smooth nonlinear equality constraints

$$e_k(x_k) = 0 \qquad (3.3.1)$$

Notice that our constraints provided in equation (3.2.1) are actually linear. We present the nonlinear case
here for completeness. We now restate the problem we would like to solve, using the superscript
$c$ to denote constrained. We are given the last
prediction and its covariance, the current measurement and its covariance, and a set of equality constraints, and
would like to make the current prediction and find its covariance matrix.

Let's write the problem we are solving as

$$z_k^c = h_k^c(x_k) + v_k^c, \qquad v_k^c \sim N(0, R_k^c) \qquad (3.3.2)$$

Here $z_k^c$, $h_k^c$, and $v_k^c$ are all vectors, each having three distinct parts. The first part will represent the prediction
for the current time step, the second part is the measurement, and the third part is the equality constraint. $z_k^c$
effectively still represents the measurement, with the prediction treated as a "pseudo-measurement" with its
associated covariance.

$$z_k^c = \begin{bmatrix} \hat{x}_{k|k-1} \\ z_k \\ 0 \end{bmatrix} \qquad (3.3.3)$$

The function $h_k^c$ takes our state into the measurement space as before

$$h_k^c(x_k) = \begin{bmatrix} x_k \\ H_k x_k \\ e_k(x_k) \end{bmatrix} \qquad (3.3.4)$$



Notice that by combining equations (3.1.1), (3.1.3), (3.1.4), and (3.1.5), we can rewrite the state prediction error as

$$\tilde{x}_{k|k-1} = \Phi_{k,k-1} \tilde{x}_{k-1|k-1} + u_k \qquad (3.3.5)$$

Now we can define $v_k^c$ again as the noise term using equation (3.3.5).

$$v_k^c = \begin{bmatrix} -\Phi_{k,k-1} \tilde{x}_{k-1|k-1} - u_k \\ v_k \\ 0 \end{bmatrix} \qquad (3.3.6)$$

And $v_k^c$ will be normally distributed with mean 0 and covariance $R_k^c$. The diagonal blocks of $R_k^c$ represent
the covariance of each part of $v_k^c$. We define the covariance of the state estimate error at time step $k$ as $P_{k|k}$.
Notice also that $R_k^c$ contains no off-diagonal blocks.

$$R_k^c = \begin{bmatrix} \Phi_{k,k-1} P_{k-1|k-1} \Phi_{k,k-1}' + Q_{k,k-1} & 0 & 0 \\ 0 & R_k & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad (3.3.7)$$



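This method of expressing our problem can be thought of as a fusion of the state prediction and the new
measurement at each iteration under the given equality constraints. Much like when we showed the Kalman
Filter, we will simply write the solution here.

$$\hat{x}_{k|k,j} = \begin{bmatrix} 0 & I \end{bmatrix} \begin{bmatrix} R_k^c & H_{k,j}^c \\ H_{k,j}^{c\prime} & 0 \end{bmatrix}^{+} \begin{bmatrix} z_k^c - h_k^c(\hat{x}_{k|k,j-1}) + H_{k,j}^c \hat{x}_{k|k,j-1} \\ 0 \end{bmatrix} \qquad (3.3.8)$$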



Notice that we use the superscript $+$ notation on a matrix throughout this paper to denote the pseudo-inverse of that
matrix. This method significantly differs from a Kalman Filter. In this method we are iterating over a dummy
variable $j$ within each time step until we fall within a predetermined convergence bound $|\hat{x}_{k|k,j} - \hat{x}_{k|k,j-1}| < c_k$
or hit a chosen number of maximum iterations. We initialize our first iteration as $\hat{x}_{k|k,0} = \hat{x}_{k-1|k-1}$ and use the
final iteration as $\hat{x}_{k|k} = \hat{x}_{k|k,J}$ where $J$ represents the final iteration.

Also, notice that we allowed the equality constraints to be nonlinear. As a result, we define
$H_{k,j}^c = \frac{\partial h_k^c}{\partial x}(\hat{x}_{k|k,j-1})$, which gives us a local approximation to the direction of $h_k^c$.

We can actually find a stronger form for this solution, where $R_k^c$ will reflect the tightening of the covariance
for the state prediction based on the new estimate at each iteration of $j$. We do not tighten the covariance
matrix within these iterations here, since in our form, we can actually change the number of equality constraints
between iterations of $j$. We will find this useful in the next section. Not tightening the covariance matrix in this
way is reflected in a larger covariance matrix for the estimate as well. This covariance matrix is calculated as

$$P_{k|k,j} = -\begin{bmatrix} 0 & I \end{bmatrix} \begin{bmatrix} R_k^c & H_{k,j}^c \\ H_{k,j}^{c\prime} & 0 \end{bmatrix}^{+} \begin{bmatrix} 0 \\ I \end{bmatrix} \qquad (3.3.9)$$

Notice that for faster computation times, we need only calculate $P_{k|k,j}$ for the final iteration of $j$. Further,
if our equality constraints are in fact independent of $j$, we can calculate $H_{k,j}^c$ only once for each $k$. This would
also imply the pseudo-inverse in equation (3.3.8) can be calculated only once for each $k$.

This method, while very different from the Kalman Filter presented earlier, provides us with an estimate
$\hat{x}_{k|k}$ and a covariance matrix for the estimate $P_{k|k}$ at each time step similar to the Kalman Filter. However, this
method allowed us to incorporate equality constraints.
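
The following sketch illustrates the equality constrained estimate for the special case used in this paper, where the only equality constraint is the linear one requiring the probabilities to sum to 1. It assumes the stacked system of Section 3.3 is solved through a block matrix built from $R_k^c$ and $H_{k,j}^c$ and inverted with a pseudo-inverse; the function name, argument names and the use of `numpy.linalg.pinv` are choices made for illustration, not a statement of the original implementation.

```python
import numpy as np


def equality_constrained_estimate(x_pred, P_pred, z, H, R, x_start,
                                  tol=1e-10, max_iter=20):
    """Fuse prediction and measurement subject to sum(x) = 1 (linear case only)."""
    x_pred = np.asarray(x_pred, dtype=float)
    z = np.atleast_1d(np.asarray(z, dtype=float))
    H = np.atleast_2d(np.asarray(H, dtype=float))
    n, m = x_pred.size, z.size
    # Jacobian of h^c(x) = [x; H x; sum(x) - 1]; constant because all parts are linear.
    Hc = np.vstack([np.eye(n), H, np.ones((1, n))])
    # R^c is block diagonal: prediction covariance, measurement noise, and a
    # zero block because the constraint is noise free.
    Rc = np.zeros((n + m + 1, n + m + 1))
    Rc[:n, :n] = P_pred
    Rc[n:n + m, n:n + m] = R
    zc = np.concatenate([x_pred, z, [0.0]])
    block = np.block([[Rc, Hc], [Hc.T, np.zeros((n, n))]])
    block_pinv = np.linalg.pinv(block)           # computed once: Hc does not depend on j
    select = np.hstack([np.zeros((n, n + m + 1)), np.eye(n)])   # the [0 I] selector
    x = np.asarray(x_start, dtype=float)
    for _ in range(max_iter):
        hc = np.concatenate([x, H @ x, [x.sum() - 1.0]])
        rhs = np.concatenate([zc - hc + Hc @ x, np.zeros(n)])
        x_new = select @ block_pinv @ rhs
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    P = -select @ block_pinv @ select.T          # covariance of the final iterate
    return x, P


x, P = equality_constrained_estimate([0.6, 0.6], np.eye(2), [0.2],
                                     [[1.0, 0.0]], [[1.0]], [0.5, 0.5])
print(x, x.sum())
```

Because every part of the stacked system is linear here, the iteration settles after a single pass; a nonlinear constraint would require recomputing the Jacobian and the pseudo-inverse inside the loop.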

3.4. Nonlinear Inequality Constraints 

We will now extend the equality constrained problem to an inequality constrained problem. To our system given
by equations (3.1.1), (3.1.2), and (3.3.1), we will also add the smooth inequality constraints given by

$$l_k(x_k) \ge 0. \qquad (3.4.1)$$

Our method will be to keep a subset of the inequality constraints active at any time. An active constraint is
simply a constraint that we treat as an equality constraint. An inactive constraint we will relax (ignore) when
solving our optimization problem. After solving the problem, we then check if our solution lies in the space
given by the inequality constraints. If it does not, we start from the solution in our previous iteration and move in
the direction of the new solution until we hit a set of constraints. For the next iteration, this set of constraints
will be the new active constraints.

We formulate the problem in the same way as before, keeping equations (3.3.2), (3.3.3), (3.3.6), and (3.3.7)
the same to set up the problem. However, we replace equation (3.3.4) by

$$h_k^c(x_k) = \begin{bmatrix} x_k \\ H_k x_k \\ e_k(x_k) \\ l_{k,j}^a(x_k) \end{bmatrix} \qquad (3.4.2)$$

$l_{k,j}^a$ represents the set of active inequality constraints. Notice that while we keep equations (3.3.3), (3.3.6),
and (3.3.7) the same, these will need to be padded by additional zeros appropriately to match the size of $l_{k,j}^a$.
Now we solve the equality constrained problem consisting of the equality constraints and the active inequality
constraints (which we treat as equality constraints) using equations (3.3.8) and (3.3.9). However, let's call the
solution from equation (3.3.8) $\hat{x}^{*}_{k|k,j}$ since we have not checked if this solution lies in the inequality constrained
space yet. In order to check this, we find the vector that we moved along to reach $\hat{x}^{*}_{k|k,j}$. This is simply

$$d = \hat{x}^{*}_{k|k,j} - \hat{x}_{k|k,j-1} \qquad (3.4.3)$$

We now iterate through each of our inequality constraints to check if they are satisfied. If they are all satisfied,
we choose $t_{max} = 1$, and if they are not, we choose the largest value of $t_{max}$ such that $\hat{x}_{k|k,j-1} + t_{max} d$ lies in the
inequality constrained space. We choose our estimate to be

$$\hat{x}_{k|k,j} = \hat{x}_{k|k,j-1} + t_{max} d \qquad (3.4.4)$$

We also would like to remember the inequality constraints which are being touched in this new solution.
These constraints will now become active for the next iteration and lie in $l_{k,j+1}^a$. Note that $l_{k,0}^a = l_{k-1,J}^a$ where
$J$ represents the final iteration of a given time step.

Note also that we do not perturb the error covariance matrix from equation (3.3.9) in any way. Under the
assumption that our model is a well-matched model for the data, enforcing inequality constraints (as dictated
by the model) should only make our estimate better. Having a slightly larger covariance matrix is better than
having an overly optimistic one based on a bad choice for the perturbation. Perturbing this covariance matrix
correctly may be investigated in the future.
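
For the non-negativity constraints of equation (3.2.1), the step described by equations (3.4.3) and (3.4.4) can be sketched as below. The function name `step_to_boundary` is hypothetical, the previous iterate is assumed to be feasible already, and the tolerance used to decide which constraints are touched is an arbitrary illustrative value.

```python
def step_to_boundary(x_prev, x_star, tol=1e-12):
    """Move from a feasible x_prev toward x_star without violating x_i >= 0.

    Returns the new estimate, the step fraction t_max and the indices of the
    constraints touched by the new estimate (the next active set).
    """
    d = [xs - xp for xs, xp in zip(x_star, x_prev)]      # direction, as in (3.4.3)
    t_max = 1.0
    for xp, di in zip(x_prev, d):
        if di < 0 and xp + di < 0:                       # a full step would go negative
            t_max = min(t_max, xp / -di)                 # largest step keeping x_i >= 0
    x_new = [xp + t_max * di for xp, di in zip(x_prev, d)]   # as in (3.4.4)
    active = [i for i, xi in enumerate(x_new) if xi <= tol]
    return x_new, t_max, active


print(step_to_boundary([0.5, 0.5], [1.5, -0.5]))  # prints ([1.0, 0.0], 0.5, [1])
```

The returned index list plays the role of the active set for the next iteration: those components would be pinned to zero as additional equality constraints when the stacked problem is solved again.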
4. COVARIANCE MATCHING TECHNIQUES

In many applications of Kalman Filtering, the process noise $Q_{k,k-1}$ and measurement noise $R_k$ are known.
However, in our application we are not provided with this information a priori so we would like to estimate
them. These can often be difficult to approximate, especially when there is a known model mismatch. We
will present one possible method to approximate these. We choose to match the process noise and measurement
noise to the past innovation (residual) process.

4.1. Determining the Process Noise and Measurement Noise 

In addition to estimating $Q_{k,k-1}$ and $R_k$, we will also estimate the innovation covariance $S_k$. We can actually
determine the innovation covariance from equation (3.1.9), but estimating it using covariance matching to the
past innovation process can provide us with a more accurate innovation covariance.

We estimate $S_k$ by taking a window of size $N_k$ (which is picked in advance for statistical smoothing) and
time-averaging the innovation covariance based on the innovation process. This is simply the average of all the
outer products of the innovations over this window.

$$\hat{S}_k = \frac{1}{N_k} \sum_{j=k-N_k}^{k-1} \nu_j \nu_j' \qquad (4.1.1)$$

Next, let's estimate $R_k$. This is done similarly. If we refer back to equation (3.1.9), we can simply calculate
this by

$$\hat{R}_k = \frac{1}{N_k} \sum_{j=k-N_k}^{k-1} \left( \nu_j \nu_j' - H_j P_{j|j-1} H_j' \right) \qquad (4.1.2)$$

We can now use our choice of $R_k$ along with our innovation covariance $S_k$ to estimate $Q_{k,k-1}$. Combining
equations (3.1.6) and (3.1.9) we have

$$S_k = H_k (\Phi_{k,k-1} P_{k-1|k-1} \Phi_{k,k-1}' + Q_{k,k-1}) H_k' + R_k \qquad (4.1.3)$$

Bringing all $Q_{k,k-1}$ terms to one side leaves us with

$$H_k Q_{k,k-1} H_k' = S_k - H_k \Phi_{k,k-1} P_{k-1|k-1} \Phi_{k,k-1}' H_k' - R_k \qquad (4.1.4)$$

And solving for $Q_{k,k-1}$ gives us

$$Q_{k,k-1} = (H_k' H_k)^{-1} H_k' \left( S_k - H_k \Phi_{k,k-1} P_{k-1|k-1} \Phi_{k,k-1}' H_k' - R_k \right) H_k (H_k' H_k)^{-1} \qquad (4.1.5)$$

Note that it may be desirable to keep $Q_{k,k-1}$ diagonal if we do not believe the process noise has any
cross-correlation. It is rare that you would expect a cross-correlation in the process noise. In addition, keeping the
process noise diagonal has the effect of making our covariance matrix "more positive definite." This can be done
simply by setting the off-diagonal terms of the estimate from equation (4.1.5), $Q^{*}_{k}$, equal to 0.

It is also important to keep in mind that we are estimating covariance matrices here which must be symmetric 
and positive semidefinite (note that the diagonal elements should always be greater than or equal to zero as these 
are variances). 
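
The covariance matching steps of equations (4.1.1), (4.1.2) and (4.1.5) can be sketched as follows. The function name `match_covariances` and its arguments are assumptions for illustration. The sketch uses `numpy.linalg.pinv` where equation (4.1.5) calls for an inverse, so that it still runs when that inverse does not exist; this substitution is a choice of the sketch rather than something stated in the text.

```python
import numpy as np


def match_covariances(innovations, predicted_terms, H, Phi, P_prev):
    """Estimate S, R and a diagonal Q from a window of past innovations.

    innovations:     list of innovation vectors nu_j over the window.
    predicted_terms: list of matrices H_j P_{j|j-1} H_j' over the same window.
    """
    outer = [np.outer(nu, nu) for nu in innovations]
    n_window = len(outer)
    S = sum(outer) / n_window                                           # (4.1.1)
    R = sum(o - t for o, t in zip(outer, predicted_terms)) / n_window   # (4.1.2)
    G = np.linalg.pinv(H.T @ H)          # pseudo-inverse used as a safe stand-in
    M = S - H @ Phi @ P_prev @ Phi.T @ H.T - R
    Q = G @ H.T @ M @ H @ G                                             # (4.1.5)
    Q = np.diag(np.diag(Q))              # keep only the diagonal of the process noise
    return S, R, Q


H = np.eye(2)
S, R, Q = match_covariances(
    [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
    [0.2 * np.eye(2), 0.2 * np.eye(2)],
    H, np.eye(2), 0.1 * np.eye(2))
print(np.diag(S), np.diag(R), np.diag(Q))
```

Nothing in this sketch guarantees that the returned matrices are positive semidefinite, so the results should be checked, or bounded as discussed next, before being fed back into the filter.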
4.2. Upper and Lower Bounds for Covariance Matrices

We might also like to set a minimum and maximum value we are willing to accept for each element of
our covariance matrices. The motivation for maintaining a minimum is that we may not want to become overly
optimistic. If the covariances drop to zero, we will in effect assume perfect knowledge of the random variable. The
reason for maintaining a maximum is in case we believe the covariance actually is upper bounded. Let us denote
these matrices by $S^{\min}$, $R^{\min}$, $Q^{\min}$, $S^{\max}$, $R^{\max}$, and $Q^{\max}$. We apply these by

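$$\hat{S}_k = \frac{1}{N_k} \sum_{j=k-N_k}^{k-1} \min\left( S^{\max}, \max\left( S^{\min}, \nu_j \nu_j' \right) \right), \qquad (4.2.1)$$

$$\hat{R}_k = \frac{1}{N_k} \sum_{j=k-N_k}^{k-1} \min\left( R^{\max}, \max\left( R^{\min}, \nu_j \nu_j' - H_j P_{j|j-1} H_j' \right) \right), \qquad (4.2.2)$$

and, using equation (4.1.5),

$$Q_k = \min\left( Q^{\max}, \max\left( Q^{\min}, Q^*_k \right) \right) \qquad (4.2.3)$$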

Again, keep in mind that the diagonal elements of $S^{\min}$, $R^{\min}$, and $Q^{\min}$ must all be greater than or equal to
zero. This is a very simple way of lower and upper bounding these matrices. There are certainly a number of other
ways we could approach this problem, some of which might be much better at preserving the original information.
Our hope for the application mentioned in this paper is that the bounds are rarely, if ever, touched.
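
As a small illustration of the element-wise bounding in equation (4.2.3), the sketch below clamps each entry of a matrix between a lower and an upper bound. The helper name `clamp_matrix` and the sample values are hypothetical and serve only to show the operation.

```python
from typing import List

Matrix = List[List[float]]


def clamp_matrix(M: Matrix, M_min: Matrix, M_max: Matrix) -> Matrix:
    """Element-wise min(M_max, max(M_min, M)) for equally sized matrices."""
    return [
        [min(hi, max(lo, x)) for x, lo, hi in zip(row, row_lo, row_hi)]
        for row, row_lo, row_hi in zip(M, M_min, M_max)
    ]


# Illustrative 2 x 2 case: zero lower bound, diagonal upper bound of .25.
Q_star = [[0.4, 0.1], [0.1, -0.05]]
Q_min = [[0.0, 0.0], [0.0, 0.0]]
Q_max = [[0.25, 0.0], [0.0, 0.25]]
Q_bounded = clamp_matrix(Q_star, Q_min, Q_max)
```

With these bounds the off-diagonal entries are forced to zero and the negative diagonal entry is raised to zero, so the result stays a valid diagonal covariance.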

5. APPLICATION OF INEQUALITY CONSTRAINED ITERATIVE OPTIMIZATION 
METHODS TO A SIMULATED MINORITY GAME 

In this section, we will apply the discussed methods to a simulation of the Minority Game. In the next section, 
we will apply these methods to real financial data. 

5.1. Generating Simulation Data 

We choose parameters m = 1 for the memory size and allow two strategies per agent resulting in six overall 
combinations as described in Section 2.3. Also, we choose the time horizon size to be T = 50. We randomly 
choose a bit string of length 50 to serve as the initial time horizon, and we also randomly choose a probability 
distribution over the six possible strategy sets. We run the simulation over 150 time steps. This results in a 
returns series from which we can extract the difference series $z_k$.

5.2. Predicting the Simulated Market Data 
5.2.1. Forming the Estimation Problem 

To track the difference series $z_k$, we set the problem up similarly to how it was generated, with m = 1 giving us
six probabilities, and we choose the time horizon size to be T = 50.

We also make the assumption that the estimate for the probability distribution at time $k$ will also be the
prediction at time $k + 1$, since we hope that our estimate at time $k$ will be well matched to the data locally.
For the simulated case, the probability distribution is actually fixed over all $k$. This boils down to choosing the
identity matrix as the transition matrix ($\Phi_{k,k-1}$ for all $k$ in the notation of Section 3.1).

We create the time horizon at each time step by looking back T time steps and checking where the minority
would have lain at each time step, as described near the end of Section 2.2, using the difference series $z_k$. If an
element of $z_k$ is 0 (implying there was no minority), we simply skip this point in our time horizon.

Finally, we score each strategy in each of our strategy sets over the time horizon, similar to what we described
in Section 2.2. This will result in a set of winning strategies. We determine the set of predictions of each
of the winning strategies, and this forms the measurement matrix $H_k$ (it is actually a vector of ±1 since the
measurements are scalars). Multiplying this by the state gives us the prediction based on the probability of
being in a certain strategy set and what the set would pick as its forecast.

Notice that in most tracking applications, the transition matrix $\Phi_{k,k-1}$ drives the system evolution through
time and the measurement matrix describes a fixed coordinate transform. Here $H_k$ changes significantly
based on the state estimate $\hat{x}_{k-1|k-1}$ of the system. In fact, the process noise $Q_{k,k-1}$ and measurement noise
$R_k$, found by covariance matching techniques in our case, combined with $H_k$ are really what drive the system
evolution through time.

5.2.2. Choosing the minimum and maximum acceptable covariances 

For the minimum and maximum acceptable covariances used in the covariance matching scheme of Section 4, we
choose

$$S^{\min} = R^{\min} = 0 \quad \text{and} \quad S^{\max} = R^{\max} = 1 \qquad (5.2.1)$$

Note that both the measurements and the innovations must lie in $[-1, 1]$. This is true because, in the extreme
situations, all the strategies can choose either $-1$ or $1$. It is also known that the variance of the distribution with
half of its weight at $a$ and the other half at $b$ is given by $\frac{(b-a)^2}{4}$. Applying this formula gives us the maximum
acceptable variances of 1 in equation (5.2.1). We use a similar idea to choose the minimum and maximum
acceptable process noise.









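$$Q^{\min} = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 0 \end{pmatrix} \qquad (5.2.2)$$

and

$$Q^{\max} = \begin{pmatrix} .25 & & 0 \\ & \ddots & \\ 0 & & .25 \end{pmatrix} \qquad (5.2.3)$$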



We choose .25 for the diagonal terms of $Q^{\max}$ by our previous logic, since each element of the probability
distribution must lie in $[0, 1]$. We force the off-diagonal elements to be 0 in order to keep our covariances more
positive definite.

In addition, we state the following definition

$$\mathrm{cov}(x, y) = \mathrm{cor}(x, y)\, \sigma_x \sigma_y \qquad (5.2.4)$$

Noticing that $\mathrm{cor}(x, y)$ takes its most extreme values at $\pm 1$ and that $\sigma_x$ and $\sigma_y$ both take their largest values
at .5, we can state

$$|\mathrm{cov}(x, y)| \le .25 \qquad (5.2.5)$$
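
The numerical bounds used in this section can be checked directly. The short sketch below (the function names are illustrative assumptions, not part of the original method) evaluates the two-point variance formula on $[-1, 1]$ and $[0, 1]$ and the resulting bound on the covariance magnitude.

```python
import math


def two_point_variance(a: float, b: float) -> float:
    """Variance of a distribution with half its weight at a and half at b."""
    mean = 0.5 * (a + b)
    return 0.5 * (a - mean) ** 2 + 0.5 * (b - mean) ** 2


def max_abs_covariance(var_x_max: float, var_y_max: float) -> float:
    """Largest |cov(x, y)| when |cor(x, y)| <= 1 and the variances are bounded."""
    return math.sqrt(var_x_max) * math.sqrt(var_y_max)


var_measurement = two_point_variance(-1.0, 1.0)  # measurements in [-1, 1]
var_probability = two_point_variance(0.0, 1.0)   # probabilities in [0, 1]
cov_bound = max_abs_covariance(var_probability, var_probability)
```

The first value reproduces the maximum variance of 1 for the measurements, and the last two reproduce the value .25 used for the state covariances.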

We may also be interested in bounding our state prediction error and state estimate error covariances since 
we know we are estimating a probability distribution. Using equation (5.2.5) for the off diagonal terms, we can 
bound these by 
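$$P^{\min}_{k|k-1} = P^{\min}_{k|k} = \begin{pmatrix} 0 & -.25 & \cdots & -.25 \\ -.25 & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & -.25 \\ -.25 & \cdots & -.25 & 0 \end{pmatrix} \qquad (5.2.6)$$

and

$$P^{\max}_{k|k-1} = P^{\max}_{k|k} = \begin{pmatrix} .25 & \cdots & .25 \\ \vdots & \ddots & \vdots \\ .25 & \cdots & .25 \end{pmatrix} \qquad (5.2.7)$$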



We did not provide a very rigorous explanation for our choice of covariance bounds here. However, in our 
example, these are the choices we made for the reasons provided above. 

For the covariance matching, we also choose $N_k$ in equations (4.2.1) and (4.2.2) to be equal to T = 50. Notice
that while we have fewer than 50 innovations, we choose $N_k$ to be the number of innovations we have. When we
have 0 innovations (at the initial point), we choose $R_k = 0$ so we can heavily trust the first measurement to
strengthen our initialization.

5.2.3. Initial Parameters 

We choose our initial state $\hat{x}_{0|0} = \frac{1}{s} 1_s$, where $s$ is the number of strategy sets (in our case 6) and $1_s$ is a column
vector of size $s$ full of 1's. This is essentially starting with a uniform distribution. We also choose our initial
covariance $P_{0|0} = .25\, I_{s \times s}$, where $I_{s \times s}$ represents the $s \times s$ identity matrix. We choose .25 again for the same
reason as before. Note that we will actually start the optimization problem at time step T + 1 since we use the
first T data points to initialize the time horizon. We also assign no process noise initially and zero measurement
noise on the first measurement.
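
For concreteness, the initialization described above can be written out as follows. This is a minimal sketch with hypothetical helper names; it only builds the uniform initial state and the scaled identity covariance for $s$ strategy sets.

```python
from typing import List


def initial_state(s: int) -> List[float]:
    """Uniform distribution over s strategy sets."""
    if s <= 0:
        raise ValueError("s must be positive")
    return [1.0 / s] * s


def initial_covariance(s: int, variance: float = 0.25) -> List[List[float]]:
    """Diagonal s x s covariance with the given variance on the diagonal."""
    return [[variance if i == j else 0.0 for j in range(s)] for i in range(s)]


x0 = initial_state(6)       # six strategy sets, as in Section 5
P0 = initial_covariance(6)  # .25 on the diagonal, 0 elsewhere
```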

Using the methods described in Section 3.4 and Section 4, we can now make predictions on this system. After 
making predictions, we need a system by which to decide which predictions to accept with greater certainty. We 
discuss this in the next section. 



5.3. Effective Forecasting 

The last question we would like to address here is: when is the forecast produced by this method good, and how
good? We could base this on the innovation covariance $S_k$, which is an estimate of the errors of the innovation
(residual) process. Note that we can either use equation (3.1.9) along with equation (3.1.6) or we can use equation
(4.1.1) to calculate $S_k$. The residual-based estimate given by equation (4.1.1) will generally provide a smoother
function through $k$, which might be desirable for finding pockets of predictability (where we can predict well for a
while).

There are various ways of using $S_k$ to decide when we would like to make a prediction. We look at a very
simple method, where we simply take a threshold value $t_k$ such that we choose $k$ in our set of prediction times
if $S_k < t_k$.
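
The threshold rule amounts to a simple filter over time steps. The sketch below assumes scalar innovation covariances, as in our application; the function name and the numerical values are illustrative only and are not taken from the experiments.

```python
from typing import List, Sequence


def prediction_times(S: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Return the time indices k at which S_k < t_k."""
    return [k for k, (s_k, t_k) in enumerate(zip(S, thresholds)) if s_k < t_k]


# Hypothetical innovation variances and a constant, illustrative threshold.
S_values = [0.5, 0.001, 0.0005, 0.2]
times = prediction_times(S_values, [0.01] * len(S_values))
```

Here only the second and third time steps fall below the threshold, so those are the only times at which a forecast would be accepted.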

5.4. Results of simulation 

We show the results from the simulated data in Figure 5.1. We choose the threshold value tk in this plot to 
be 10~^ for all k. Notice that in our case, tk is a scalar since the measurements are scalars. As we can see in 
the plot, we are able to make good forecasts at over 30 points (where our innovations lie within the innovation 
standard deviation). Notice that we only attempt to make forecasts at the last 100 points of the 150 generated 
data points (we use the first 50 points to generate the initial time horizon as mentioned earlier). However, we 
choose only to make a prediction at 34 of the data points. It happens to be that these 34 data points are the 
first 34 that we attempt to forecast. We have 2 bad predictions at the end of the plot. After the bad predictions, 
we never recover to a good prediction since the covariance matching scheme drives the estimated process noise
$Q_{k,k-1}$, estimated measurement noise $R_k$, and estimated innovation covariance $S_k$ up due to the large spike in
the single residual, which affects the statistical smoothing for 50 time steps. Since the covariance of the state
remains tight and the covariance of the measurements is relatively large, new measurements aren't trusted and
are not given much weight for creating forecasts.

At the same time, because of the transient introduced by the statistical smoothing, we continue to make forecasts
immediately after the first false prediction. We might choose to incorporate a scheme to not make predictions
for some length of time immediately after a false prediction to allow $Q_{k,k-1}$, $R_k$, and $S_k$ to respond to the shock
caused by the false prediction. Further, we might like to significantly increase the process noise after a false
prediction to effectively cause new measurements to have a stronger weight in forming estimates.
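
One possible form of such a scheme is sketched below. It is an assumption-laden illustration rather than something we tested: a prediction is treated as false when the innovation falls outside one predicted standard deviation, and the `cooldown` length, the function name, and the sample values are all hypothetical.

```python
import math
from typing import List, Sequence


def prediction_times_with_cooldown(
    innovations: Sequence[float],
    S: Sequence[float],
    threshold: float,
    cooldown: int,
) -> List[int]:
    """Threshold rule on S_k, pausing for `cooldown` steps after a false prediction."""
    times: List[int] = []
    blocked_until = -1  # last time index at which predictions are suppressed
    for k, (nu_k, s_k) in enumerate(zip(innovations, S)):
        if k <= blocked_until or s_k >= threshold:
            continue
        times.append(k)
        # Assumed definition of a false prediction: |innovation| exceeds one
        # standard deviation of the predicted innovation variance.
        if abs(nu_k) > math.sqrt(s_k):
            blocked_until = k + cooldown
    return times


# Hypothetical data: a single large residual at k = 1.
nu = [0.01, 0.5, 0.01, 0.01, 0.01]
S_pred = [0.001] * 5
accepted = prediction_times_with_cooldown(nu, S_pred, threshold=0.01, cooldown=2)
```

In this example the false prediction at the second step suppresses forecasts for the following two steps, after which forecasting resumes.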




Figure 5.1. In the above plot, the solid line represents the innovation (measurement residual) process and the dashed line
represents one standard deviation about zero based on the predicted innovation variance. We select "Prediction Times"
in this plot as times when the data is in a predictable state based on the innovation covariance. These times need not be
consecutive in the original data, although they often are.



6. APPLICATION OF INEQUALITY CONSTRAINED ITERATIVE OPTIMIZATION 
METHODS TO REAL FOREIGN EXCHANGE DATA 

We now apply the ideas in this paper to real financial data. We choose hourly USD/YEN foreign exchange rate 
data from 1993 to 1994 provided by Dr. J. James of Bank One in London. 

6.1. Setting up the Problem 
6.1.1. Scaling the difference series 

We first find the difference series from the returns series as before. For the time being, let's call this $z_k^* = r_k - r_{k-1}$.
Since our algorithm only makes forecasts in $[-1, 1]$, we might like to scale our inputs to this domain as best as
possible. Assuming that we have measurements a priori up to time step $K$, we estimate the scaling based on
these measurements. Let's denote the set of all measurements up to time $K$ by $z^K$, where $z^K$ in our case is a
vector of scalar measurements. We choose the following method:

Let's denote the minimum and maximum elements of a vector $V$ by the functions $\min(V)$ and $\max(V)$,
respectively. And let's denote the minimum and maximum elements of $z^K$ by $z^{\min}$ and $z^{\max}$, respectively. We first
proportionally scale the spacing between elements of $z^K$ so the difference between the minimum element and the
maximum element is 2 (the size of $[-1, 1]$). We denote this by $z_k^{**}$:

$$z_k^{**} = \frac{2 z_k^*}{z^{\max} - z^{\min}} \qquad (6.1.1)$$

Next, we shift the elements so the minimum element of the scaled a priori set $z^{K**}$ is at $-1$. This will
automatically place the maximum element at $+1$.

$$z_k = z_k^{**} - \left( \min(z^{K**}) + 1 \right) \qquad (6.1.2)$$

We do the calculation for $z^{\min}$ and $z^{\max}$ once with all of the a priori information we have. Then we can use
equations (6.1.1) and (6.1.2) to scale for any time step $k$. Our hope is that, based on the a priori information,
our choice of $z^{\min}$ and $z^{\max}$ will reflect the true spacing. If we know true values for these, we can use them
instead. Notice also that we can still find the returns series from this definition for the measurements simply
by inverting the process.

There will certainly be a number of different ways to do this scaling as well. 
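
The scaling of equations (6.1.1) and (6.1.2), together with its inverse, can be sketched as below. The function names and sample values are assumptions made for illustration; the two steps are combined into a single affine map that sends the a priori minimum to $-1$ and the a priori maximum to $+1$.

```python
from typing import Sequence, Tuple


def fit_scaling(z_prior: Sequence[float]) -> Tuple[float, float]:
    """Minimum and maximum of the a priori difference series."""
    z_min, z_max = min(z_prior), max(z_prior)
    if z_max == z_min:
        raise ValueError("a priori data must contain at least two distinct values")
    return z_min, z_max


def scale(z_raw: float, z_min: float, z_max: float) -> float:
    """Stretch the spacing to width 2, then shift the minimum to -1."""
    stretched = 2.0 * z_raw / (z_max - z_min)
    offset = 2.0 * z_min / (z_max - z_min) + 1.0
    return stretched - offset


def unscale(z: float, z_min: float, z_max: float) -> float:
    """Invert `scale` to recover the raw difference."""
    return (z + 1.0) * (z_max - z_min) / 2.0 + z_min


# Hypothetical a priori differences.
z_min, z_max = fit_scaling([-0.3, 0.1, 0.5])
scaled = [scale(v, z_min, z_max) for v in (-0.3, 0.1, 0.5)]
```

Values observed later that fall outside the a priori range will map outside $[-1, 1]$, which is why the choice of the a priori minimum and maximum matters.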
6.1.2. Forming the Estimation Problem 

For the rest of this estimation problem, we actually do the setup exactly the same as in Section 5.2 and we follow 
the effective forecasting scheme exactly as in Section 5.3 choosing threshold value tk to be 10~^ again for all k. 

6.2. Results on the Real Data 

We show the results from the real data in Figure 6.1. Here we had over 4000 data points. We chose to make
predictions at about 100 data points, of which over 90 we accept. Again, these are the first data points we attempt
to make a forecast on. And again we see some false predictions near the end. Incorporating a scheme to not
make predictions immediately after a false prediction, as mentioned in Section 5.4, would leave us with only 1
false prediction and over 90 good predictions.

6.3. Extension to Other Games 

Note that we chose the Minority Game as the game we thought best exhibits the dynamics of the financial
time-series we analyzed. The method for forecasting we describe in this paper can be used with a number of
different models (or games). We require the following of our model and forecast scheme:

1) We can parameterize the problem into quantities we would like to estimate iteratively. 

2) We have a way to estimate the transition dynamics for the parameters at each iteration and this function will 
always lie in the class of continuously differentiable functions. 

3) We have a way to estimate the measurement function which takes the parameter space into the measurement 
space at each iteration and this function will always lie in the class of continuously differentiable functions. 

4) We have a way to estimate the process noise and measurement noise at each iteration. 

Figure 6.1. In the above plot, the solid line represents the innovation (measurement residual) process and the dashed line
represents one standard deviation about zero based on the predicted innovation variance. We select "Prediction Times"
in this plot as times when the data is in a predictable state based on the innovation covariance. These times need not be
consecutive in the original data, although they often are.

7. ADDITIONAL ALGORITHM TESTS 

Here we set up some Monte Carlo runs and discuss two additional tests to check that our algorithm is working 
properly. The first is a test to check the inner workings of the algorithm, while the second checks the functional 
results that we are interested in. 

7.1. State Estimate Errors 

As a further validation to the method described in this paper, we would like to check that the state estimate 
errors (3.1.4) are tending towards a zero mean process. 

We set up the problem exactly as in Section 5 and do 400 Monte Carlo runs where our Monte Carlo space is
the initial time horizon ($2^{50}$ possibilities) and our initial distribution used for generating the simulation (infinite
possibilities) as described in Section 5.1. For each run, we pick the time horizon uniformly at random and we
pick the initial distribution by taking a random vector with each element chosen uniformly at random from $[0, 1]$
and then normalizing the vector.
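
The random draw for a single Monte Carlo run can be sketched as follows. The function name and the use of a seeded generator are assumptions made for reproducibility of the illustration; the sizes T = 50 and s = 6 follow Section 5.1.

```python
import random
from typing import List, Optional, Tuple


def draw_run(
    T: int = 50, s: int = 6, rng: Optional[random.Random] = None
) -> Tuple[List[int], List[float]]:
    """Draw a random initial time horizon and a random strategy-set distribution."""
    rng = rng or random.Random()
    # Each bit of the initial time horizon is 0 or 1 with equal probability.
    horizon = [rng.randint(0, 1) for _ in range(T)]
    # Uniform entries in [0, 1), normalized to sum to one.
    weights = [rng.random() for _ in range(s)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("degenerate draw: all weights are zero")
    distribution = [w / total for w in weights]
    return horizon, distribution


horizon, distribution = draw_run(rng=random.Random(0))
```

By symmetry, each element of the normalized vector has expectation $\frac{1}{s}$, which is the uniform distribution.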

In order to determine the state estimate errors, we take as the true state the expectation of our true state,
$\frac{1}{s} 1_s$, which is also our educated guess for our initial state from Section 5.2.3. Notice that the red line in Figure
7.1 actually starts out initially with a very tight covariance. This is a result of choosing our initial state as the
expectation of the true state (but not the true state).

Figure 7.1. In each of the above plots, the blue line indicates the mean innovation process with the error bars indicating
the standard error of the mean as given over 400 runs. The red line indicates the calculated standard deviation based
on the sample statistics centered around zero. The green line indicates the mean standard deviation calculated as the
square root of the mean variance over the 400 runs. The black line indicates the mean standard deviation calculated as
the mean of the standard deviations over the 400 runs.

7.2. Innovations Process 

The other test that would be important to us is to ensure that our forecast method would give innovations that 
have mean zero also. This is a given based on Figure 7.1. We show this result in Figure 7.2. Notice that the 
parabolic nature of the green and black lines here is due to the fact that we take our first measurement to have 
no noise as mentioned in Section 5.2.3. 

Finally, in Figure 7.3, we show one more plot in which, at each time step, we keep only those points which
have $S_k < t_k$ = 10"^. In any given time step, at most 3 points of the 400 were removed. These would correspond
to difficult-to-estimate problems (such as the example we chose for Figure 5.1). We notice that removing these
points tightens our covariances drastically, as hoped.

8. CONCLUSION 

We have shown a way to forecast time-series by parameterizing an artificial market model such as the Minority 
Game, using an iterative numerical optimization technique. In our technique we also describe how to follow an 
effective forecasting scheme which leads to pockets of predictability. 

This paper is meant only to serve as an introduction to such methods of forecasting. 